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ABSTRACT 

Knowledge of the structure of galaxy clusters is essential for an understanding of large scale structure 
in the universe, and may provide important clues to the nature of dark matter. Moreover, the shape of 
the dark matter distribution in the cluster core may offer insight into the structure formation process. 
Unfortunately, cluster cores also tend to be the site of complicated astrophysics. X-ray imaging spec- 
troscopy of relaxed clusters, a standard technique for mapping their dark matter distributions, is often 
complicated by the presence of cool components in cluster cores, and the dark matter profile one derives 
for a cluster is sensitive to assumptions made about the distribution of this component. In addition, 
fluctuations in the temperature measurements resulting from normal statistical variance can produce 
results which are unphysical. We present here a procedure for extracting the dark matter profile of a 
spherically symmetric, relaxed galaxy cluster which deals with both of these complications. We apply 
this technique to a sample of galaxy clusters observed with the Chandra X-ray Observatory, and com- 
ment on the resulting mass profiles. For some of the clusters we compare their masses with those derived 
from weak and strong gravitational measurements. 

Subject headings: X-rays : galaxies: clusters — cosmology : dark matter 



1. INTRODUCTION 

The cold dark matter (CDM) paradigm of modern cos- 
mology has enjoyed spectacular success in describing the 
formation of large-scale structure in the universe (Navarro, 
Frenk & White 1997; Moore et al. 1999b; Lahav et al. 
2001; Peacock et al. 2001). There are, however, several 
nagging inconsistencies between the results of numerical 
CDM experiments and observations. On small scales, the 
dark matter halos in dwarf and low surface brightness 
galaxies are much less cuspy than in CDM simulations 
(Burkert 1995; McGaugh & de Blok 1998; Moore et al. 
1999b). Disk galaxies produced in simulations tend to 
have inadequate masses and angular momenta (Navarro & 
Steinmetz 2000). The number of Milky Way satellites ap- 
pears to be at least an order of magnitude lower than CDM 
predictions (Kauffman, White, & Guideroni 1993; Moore 
et al. 1999a; Klypin et al. 1999). On larger scales, some 
studies (Tyson, Kochanski & DelPAntonio 1998; Smail 
et al. 2000) report galaxy clusters with central density 
profiles that are flatter than CDM predictions, although 
these are somewhat controversial (Broadhurst et al. 2000; 
Shapiro & Iliev 2000). 

The density profile of bound structures which form 
through the hierarchical assembly of smaller structures is 
usually parameterized as a power law at small scales and 
a separate power law on large scales (e.g. Jing & Suto 
(2002)): 

P(T) = (r/r s )°(l + r/r fl )7-« (1) 



The four parameters in this description are the density po 
at some fiducial radius, the inner power law index a, the 
outer power law index 7, and the scale radius r s setting 
the break between the two power laws. While it is gen- 
erally agreed that 7 = 3, the value of a has generated 
considerable debate. Simulations predict a value between 
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1.0 (Navarro, Frenk & White 1996, 1997) and 1.5 (Moore 
et al. 1999b; Fukushige & Makino 2001), roughly indepen- 
dent of halo mass and formation epoch. In nature, how- 
ever, a shows a larger variation, and is likely a function 
of halo mass. Ha rotation curves of low-surface bright- 
ness galaxies indicate density profiles which are signifi- 
cantly flatter - a ~ 0.5 - than CDM predictions (Swaters, 
Madore & Trewhella 2001; Dalcanton & Bernstein 2000; 
Bordello et al. 2003; Swaters et al. 2003). X-ray obser- 
vations of galaxy clusters generally show steeper profiles, 
however, with a ~ 1.2 (Lewis, Buote & Stocke 2002) to 
1.9 (Arabadjis, Bautz & Garmire 2002). 

These discrepancies are often ascribed to limitations of 
the astrophysics or the physics included in the simulations. 
Baryon physics, if included, may be tacked on at the con- 
clusion of a simulation according to a set of semi-analytic 
and/or empirical prescriptions. It is likely that baryon 
physics will play a significant role in the evolution of the 
central halo. Reports of a halo "entropy floor" (Ponman, 
Cannon & Navarro 1999; Lloyd-Davies, Ponman, & Can- 
non 2000) suggest non-gravitational sources of heating and 
feedback either prior to or during halo formation (Balogh, 
Babul & Patton 1999; Loewenstein 2000; Wu, Fabian & 
Nulsen 2000) which are probably baryonic in origin (see 
Mushotzky et al. (2003), however). The question then be- 
comes one of determining where baryon physics ceases to 
be important. While the inclusion of baryon astrophysics 
in sufficient detail may remedy these problems, its effects 
will require a great deal of effort to disentangle (Frenk 
2002). 

It could be, however, that the missing ingredients in 
the simulations are not all astrophysical. One possibility 
is that the initial power spectrum of the primordial fluc- 
tuations is not scale invariant. If the primordial spectral 
index of density perturbations is not precisely 1 (as is nor- 
mally assumed by appealing to standard inflationary cos- 
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mology), the formation epoch of halos may be delayed suf- 
ficiently to ameliorate the central density problem (Alam, 
Bullock & Weinberg 2002; Zcntncr & Bullock 2002). An- 
other possibility is that important dark matter particle 
physics is being overlooked, and that the assumption of 
no non-gravitational self-interactions is faulty. Proposed 
modifications of CDM include, though are not limited to, 
self-interacting dark matter (Spergel & Steinhardt 2000; 
Firmani et al. 2000), warm dark matter (Hogan & Dalcan- 
ton 2000), annihilating dark matter (Kaplinghat, Knox & 
Turner 2000), scalar field dark matter (Hu & Peebles 2000; 
Goodman 2000), and mirror matter (Mohapatra, Nussinov 
& Teplitz 2002), each of which is invoked to soften the core 
density profile. Many of these modifications will soften the 
core profile of galaxy clusters as well, in conflict with many 
X-ray determinations of mass profiles, although other as- 
trophysical processes such as the adiabatic contraction of 
core baryons (Hennawi & Ostriker 2002) may mitigate this 
effect. 

In an effort to discriminate between CDM modifica- 
tions and other astrophysical influences we are mapping 
the dark matter profiles of a large sample of galaxy clus- 
ters. Specifically, we use imaging spectroscopy from the 
Chandra X-ray Observatory (Wcisskopf et al. 2002) to de- 
termine the deprojected temperature and density profiles 
of the baryonic content of each galaxy cluster, which we 
then use to derive its dark matter profile. In this paper 
we describe our method, and apply it to a sample of low- 
and moderate-redshift clusters. We describe our spectral 
dcprojcction technique in §2; we discuss the problems in- 
volved in converting these results to a mass profile and our 
solution in §3; we examine the effects that cooling flow 
model assumptions have on our profiles, and we present 
a statistical analysis of the models and a prescription for 
choosing among them using Markov Chain Monte Carlo 
sampling in §4. Finally, we summarize our findings in §6. 
In a subsequent paper we will examine these profiles for 
their implications for large scale structure formation and 
dark matter particle properties (Arabadjis & Bautz, in 
preparation [AB]). 

2. SPECTRAL DEPROJECTION 

We begin with a Chandra imaging spectroscopic obser- 
vation of a galaxy cluster using the ACIS detector, either 
with the S3 chip or the I array. We start with a level 2 data 
set that has been processed in the usual way, filtered for 
periods of high background using the procedures described 
in the CIAO Science Threads 3 , with point sources removed 
(for details see Arabadjis, Bautz & Garmire (2002)). Af- 
ter locating the center of the projected emissivity, we lay 
down a series of adjacent, concentric annuli centered on the 
emission peak. The annular dimensions are set to include 
enough source photons (1000-2000+) to reliably determine 
the plasma temperature. RMF and ARF response matri- 
ces are constructed, as is a background spectrum from a 
region external to the outermost annulus. The spectra are 
recorded in PI format, and grouped such that there is a 
minimum of 20 counts per channel. 

Our spectral deprojection has been presented elsewhere 
(Arabadjis, Bautz & Garmire 2002), and so we just briefly 
summarize here. To derive spherical radial profiles we con- 

3 See http://asc.harvard.edu/ciao/threads/all.html 



struct a model consisting of N concentric spherical shells 
whose inner and outer radii correspond to the inner and 
outer cylindrical radii of the projected annuli in the data 
set. The volume intersection matrix V, whose elements 
Vij contain the volume of spherical shell j intersected by a 
cylindrical shell formed by the projection of annulus i, is 
used to set the linear relations between each of the normal- 
izations as specified by the binning geometry. Each shell 
i on [l,N] (1 is the innermost shell - a sphere - and N 
is the outermost shell) contains an optically thin thermal 
plasma whose emission characteristics are determined by 
the MEKAL model (Mewe, Gronenschild & van den Oord 
1986; Mewe, Lemen & van den Oord 1986; Kaastra 1992; 
Liehdal, Osterheld & Goldstein 1995) within XSPEC (Ar- 
naud 1996) using two free parameters, the temperature T 
and the normalization K. 

In some models we will allow the innermost N c shells to 
contain a second emission component at a (lower) temper- 
ature T c as a first-order treatment of the cooler plasma. 
This cool component is assumed to be in pressure equi- 
librium with the hot component; this means that the cool 
and hot components cannot both be in hydrostatic equi- 
librium. We assume that the hot component is in hydro- 
static equilibrium. We adopt this form for the cooling flow 
model for two reasons: (1) there is no evidence that the 
plasma in cooling flow cluster cores cools below about 1 
keV (Peterson et al. 2003) as in the isobaric cooling flow 
model of Mushotzky & Szymkowiak (2003), and (2) it is 
arguably the simplest adjustment that can be made to the 
uniphase model. The cool component could be arranged in 
droplets or filaments which arc replenished as they migrate 
to the unspecified sink at r = 0, effecting a hydrodynam- 
ical equilibrium. The details of the geometry of the cool 
component are unimportant since they are below our spa- 
tial resolution limit; it is only our assumption that it is 
in pressure equilbrium with the hot component which has 
observational consequences. In this study we use the pres- 
sure gradient in the core to measure the radial dependence 
of the enclosed gravitating mass. 

The number of parameters in this model is rather large. 
Each MEKAL component contains six parameters, of 
which two (T and K) are allowed to vary. Thus the N 
annuli in the data set arc modelled using N + N c emis- 
sion components. Including a Galactic absorption column 
yields a model with 6(N + N c ) + 1 parameters (although 
only a subset of these actually vary) . Each of the N annuli 
independently constrain between 1 and N of the model 
shells in the fitting, and so the complete XSPEC model 
contains N[6(N + N c ) + 1] components, 2(N + N c ) + 1 
of them variable. The current version of XSPEC admits 
models with 1000 parameters, any 100 of which can vary. 
This limits our model to N = 12 annuli for N c = (876 
parameters, 25 variable) and TV = 10 annuli for N c = 2 
(730 parameters, 25 variable). We have written a program 
which reads in the data annuli dimensions and writes out 
an XSPEC script that handles all of the data manipula- 
tion. The script initializes the model parameters, performs 
the x 2 minimization, and calculates parameter uncertain- 
ties. This software is available to the public through re- 
quests to the authors. 

Most spectral deprojection schemes rely on an "onion 
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peeling" approach (Fabian et al. 1981; Allen & Fabian 
1997; David et al. 2001; Lewis, Stocke & Buote 2002; Sun 
et al. 2003): the outermost annulus is modeled using the 
outermost spherical shell; its model parameters are then 
frozen and its emission is subtracted from all annuli in- 
terior to it. The next most outer shell is then modelled, 
the resulting model again frozen and subtracted from the 
interior, and so forth, until the entire cluster has been 
modelled. The virtue of this technique is that the number 
of model components scales as N instead of iV 2 , allow- 
ing for greater spatial detail. However, because the pa- 
rameters of each model shell are frozen and the model is 
subtracted from interior shells as if it contained zero un- 
certainty, the technique does not find the global "best fit" 
of the model parameters. In many cases, the errors quoted 
in onion-peeling analyses are the uncertainties associated 
with a single layer of the onion (David et al. 2001). Error 
estimates derived from Monte Carlo simulations are more 
reliable, limited primarily by the number of simulations 
used to estimate the uncertainties (Lewis, Stocke & Buote 
2002). In our study all of the model parameters are fit 
simultaneously; the subsequent determination of error in 
subsets of interesting variables is a true expression of the 
parameter uncertainties in the model. We note here that 
although our model appears to have many more parame- 
ters than the other techniques, in actuality the numbers 
are equivalent because of the web of linear dependences 
among the component normalizations. 



3. MASS PROFILES 

Many deprojection methods rely on analytic formulae 
for the radial run of temperature, surface brightness or 
mass, either during or after the fitting process (Allen 2001; 
David et al. 2001; Hicks et al. 2002; Pizzolato et al. 2003). 
The greatest advantage of a parametric treatment is nu- 
merical stability. In addition, it is common practice to 
smooth noisy profiles before using them in subsequent cal- 
culations. This latter technique is especially useful when 
deriving gravitating mass profiles since a numerical deriva- 
tive must be computed. A serious drawback of these ap- 
proaches is that it is difficult to quantify the effect of the 
parameterization, or the smoothing, on the results. Ad- 
ditionally, it is often difficult or impossible to propagate 
errors through to the results. 

Our non-parametric deprojection technique does not 
guarantee smooth temperature and density profiles, so 
we have devised a method whereby the mass profile is 
computed from within the error envelope of (p(r),T(r)). 
By choosing a statistically reasonable realization of the 
model, we are able to compute mass profiles which are 
not only smoother than those obtained from the uncon- 
strained (p,T) set, but which avoid the unphysical results 
that arise due to the statistical fluctuations inherent in 
measurements. Essentially, our procedure imposes phys- 
ically motivated constraints to reduce the uncertainty in 
the temperature and mass profiles, and provides a statistic 
which characterizes the reliability of the mass reconstruc- 
tion. 

The standard procedure for extracting the gravititating 
mass profile of a galaxy cluster is to insert its deprojected 
temperature and density profiles into the hydrostatic equa- 



tion (Sarazin 1988): 

^ r Gimi.pl r (dlogr dlogr) ' ^ 

Here T and p are the local (baryonic) plasma temperature 
and density, r is the spherical radius, M r is the total mass 
enclosed within r (i.e. baryons plus dark matter), and m p 
and \i are the proton mass and mean particle weight, re- 
spectively. Implicit here is the assumption that the cluster 
is supported solely by an isotropic thermal pressure gra- 
dient, i.e. that random motions greatly exceed the bulk 
rotational motion, and that the magnetic field energy den- 
sity is negligible in comparison with the thermal energy 
content of the plasma. We further assume that no recent 
merger event has caused a disruption in the pressure and 
density. Given the run of density and temperature in our 
binning scheme (r i7 p i7 and Tj, i — 1,2, ...,N), we can 
calculate Mj using a simple difference equation version of 
Equation 2: 

Mj = -An Tj ( U+1 ~ t< ~ 1 + di+1 ~ di ~' ) (3) 

\ x i+l ~ ) 

where A = k/GLim p , x { = log 10 (n), di = log 10 (p^, and 
ti = log 10 (Tj). Since the goal of this technique is to derive 
enclosed mass profiles, we define r as the outer radius of 
each shell. 

Because of measurement error a mass profile calculated 
in this way is not guaranteed to be physically reasonable. 
Even if the assumptions of spherical symmetry and hy- 
drostatic equilibrium were valid, statistical fluctuations in 
the temperature measurements could result in unphysical 
points in our derived mass profile, for example dM r /dr < 
or even M r < 0. To deal with the non- physical fluctua- 
tions, we proceed under the assumptions that all unphys- 
ical values in the mass profile are due to measurement 
uncertainty in either p or T. Specifically, to estimate the 
(p, T) profiles we impose the condition that the computed 
total gravitating mass profile is consistent with 

p(total) = p(baryons) + /?(dark matter) > for r > 

Since p(total) = j^j^r and l/4nr 2 is positive definite, 
this is equivalent to the constraint 

dM r /dr > (5) 

Combining Equation 5 with the boundary condition 
M r (0) > (e.g., allowing for the presence of an unresolved 
central object) we obtain a second constraint: 

M r (r) > (6) 

Perhaps the most natural way to impose these condi- 
tions would be to invoke them as a Bayesian prior in the 
spectral fitting procedure. Bayes' Theorem (Bayes 1763; 
Papoulis 1984) states that the probability of model M 
given data set D (the posterior distribution P(M|D)) is 
proportional to the product of the probability of that data 
set given the model (the likelihood function P(D\M)) and 
the probability of the model itself (the prior knowledge 
function P(M)). 

P(M|D) ocP(D|M) -P(M) (7) 

The model prior could be chosen to enforce the constraints 
in Equations 5 and 6. Thus for certain combinations of 
model parameters Tj, pi, we could choose a prior such that 
P(M(T i ,p i ))=0. 
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Unfortunately this approach is computationally imprac- 
tical. The difficulty lies in the fact that the relatively sim- 
ple constraints on M(r) and dM/dr lead to complicated 
(nonlinear) constraints of the coupled parameters of the 
spectral models {Ki,Ti). The problem is more tractable 
if we instead apply the Bayesian constraints after uncon- 
strained density and temperature profiles have been deter- 
mined using the spectral deprojection method described 
in §2. Thus we compute an unconstrained mass profile 
from Equation 3, and check it for consistency with the 
constraints in Equations 5 and 6. These three equations 
yield constraints on the solutions to the difference equa- 
tions which we define using Yi and Zf. 
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Physically these constraints prohibit the pressure of the 
X-ray emitting plasma from rising with radius. Using the 
model parameter error estimates as a guide, we allow the 
temperature and normalization of each model component 
to vary if either of these constraints is violated by the pro- 
file. Our goal is thus to find a new set of density and 
temperature values p and T which are as close as possible 
to the the original profiles ( "fidelity" ) but which obey the 
constraints of Equations 8 and 9 ( "physicality" ) . 

At first glance this appears to be a standard problem 
in constrained optimization. Two features of the prob- 
lem suggest than an alternate route is preferable, how- 
ever. First, as mentioned above, the constraints are non- 
linear functions of the independent variables, so the con- 
strained optimization approach is very complex. Second, 
if we exercise control over the competing interests of fi- 
delity and physicality, as incorporated in a penalty func- 
tion, rather than simply eliminating solutions which vio- 
late Equations 8 and 9, we retain the ability to modulate 
departures from the unconstrained profiles. For example, 
it may be the case that the temperature profile would need 
substantial alteration in order to satisfy the rigid imposi- 
tion of constraints 8 and 9, but that a minor violation of 
Equation 9 (say, at only one point in the mass profile) 
would allow a temperature profile of much greater fidelity. 
In this instance it is to our advantage to have the abil- 
ity to set the relative weighting between the fidelity and 
physicality terms in the penalty function. 

Implicit in our choice of a penalty function to represent 
the constraints is the assumption that our estimate of the 
location of the peak of the model probability distribution 
function P est (M(Tj, p{)), obtained by enforcing fidelity, is 
close to the actual peak of the distribution P (M(Ti, p^), 
which would obtain if the Bayesian priors were utilized 
during the spectral deprojection. This idea is shown 
schematically in Figure 1. As the figure illustrates, in the 
absence of significant small-scale structure in P(M(Ti, pi)), 
-Pest(M(Tj, p^) will ideed lie near P (M(Ti, pi)) in parame- 
ter space. We therefore cast our physicality constraints as 



terms in a cost function, and henceforth refer to the opti- 
mized solution for M as the constrained profile, with the 
understanding that it is an estimate of the fully Bayesian 
solution, and that the constraints do not rigorously for- 
bid excursion into disfavored regions of parameter space. 
Our formulation of the cost function will reflect our subjec- 
tive assessment of the relative importance of fidelity versus 
physicality, and use standard optimization algorithms to 
minimize it. 

We use the likelihood to characterize the fidelity term 
in the cost function. The probability density P at the 
vector (p,T) = (pi, p~i, p~N, 7i, T 2 , T N ), which is near 
the unconstrained profiles (p,T), is given by 

N 

P= (2n) N / 2 IJ^tJ- 1 e-^- pi)2 / 2 < e -(*«- r «)7K 

i=l 

(10) 

where we have assumed that the errors in p and T are un- 
corrected and normally distributed, characterized by u Pi 
and o~Ti i respectively. 4 In maximum likelihood methods, 
cost functions are conveniently defined as the negative log 
of the likelihood (e.g. von Mises (1964)), so we define the 
fidelity penalty function Q as 

Q = -21ogP 

= iVlog(27r) + 

N 

log a Pl + log er Ti + 
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Here Qo is a constant that depends only on the measure- 
ment errors and x 2 is the 2A r -dimcnsional variance: 
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(12) 

We will ignore Qo since it will not affect the minimization. 

We follow a similar procedure to derive the physical- 
ity terms in the cost function. We wish to incorporate 
the physicality contraints by penalizing negative values of 
Yi and Zi in Equations 8 and 9 without attempting to 
maximize them if they are positive. Consider the penalty 
function 

g(x) = (\x/x \ - x/x Q )' n + (\x/x - 1| + x/x Q - l)' n 

-ln(i)+ln(C) (77 >1) (13) 

This function has several useful properties: (1) it has a 
value of zero for < x < x ] (2) it increases rapidly for 
x < or x > xq; (3) it has a continuous first derivative 
everywhere for rj > 2 (or everywhere but x = or x = xq 
for 1 < rj < 2), making it well-suited for optimization rou- 
tines that rely on gradient information to increase their 
efficiency; and (4) h(x) = e^ 9 ^ is a well-behaved proba- 
bility distribution function. C is of order one; for arbitrary 
values of rj > 1 it can be determined numerically. g(x) and 
h(x) are shown in Figure 2. 

We employ g(x) to characterize the two physicality re- 
quirements, M r > and dM r /dr > 0, in their difference 

nl/2 

observations, and therefore that deviations in d and t are correlated: 8d = — In practice, however, this is unnecessary since the relative 
uncertainty in the temperature of a shell is enormous compared with the uncertainty in its density. 
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equation form (Equations 8 and 9). In both cases Xq is 
set to a large value that we don't expect to exceed. In the 
former case we set it to 10 18 M Q ; in the latter we can use 
the equivalent of <ilog(M r )/rflog(r) < 3, corresponding 
to the upper limit for a monotonically decreasing density 
profile. We can ignore the normalization and set K = 1 
since we will be weighting the physicality penalties against 
the fidelity penalty. The complete cost function / for the 
minimization is thus: 

JV 

f = J2[A l9 (Y l ) + A 2 g(Z l )]+ X 2 (14) 

The weights A\ and A 2 and the exponent r\ are chosen to 
determine whether the departure from the unconstrained 
profile is reasonable to achieve a physically acceptable so- 
lution. 

Conveniently, this technique has a built-in gauge of the 
fidelity the x 2 value of the constrained profiles (although 
it should be noted that it is not distributed as the clas- 
sical x 2 ). If X 2 < 1 then we consider the new profile to 
be a reasonable excursion within the profile's uncertainty 
envelope. Moreover, we identify the constrained profiles 
(p, T, M r ) as a closer approximation to the true profiles, 
under the assumption that our model is correct. Con- 
versely, x 2 3> 1 indicates that the (p, T) profiles require 
excessive alteration to produce a physically sensible mass 
profile. This could indicate that our assumptions of spher- 
ical symmetry and/or hydrostatic equilibrium are invalid, 
that there is significant spatial or spectral substructure, or 
that our thermal emission model is inadequate. Regardless 
of the root cause, the accuracy of such a profile is suspect. 

Nulscn & Bohringcr (1995) (hereafter NB95) developed 
a non-parametric approach to this problem which also 
makes use of the fact that the enclosed gravitating mass 
of a cluster must be monotonically increasing. With their 
method one obtains a series of interdependent constraints 
on the mass at each point in r. These constraints are trans- 
lated into a set of likelihood functions which are assumed 
to be independent for computational ease. These likeli- 
hood functions are jointly maximized to derive the mass 
profile. Our method is similar in that it makes use of a 
likelihood function based upon excursions within an uncer- 
tainty envelope. The methods differ significantly, however, 
in the deprojection algorithm (NB95 use onion-peeling), 
and in the enforcement of the mass constraints. In NB95, 
the mass constraints are absolute; in our method they can 
be invoked to any degree - they can be rigidly enforced, 
completely ignored, or somewhere in between. The x 2 
metric of the fidelity is a powerful tool. If the physical- 
ity weight is set to an arbitrarily large value, we force the 
profile to monotonically increase, in effect obtaining the 
result of the NB95 technique. As a biproduct we obtain 
the plasma temperature and density profiles which are re- 
quired, and the x 2 value immediately tells us how likely 
it is that the temperature and density measurements con- 
spired to achieve this condition. In cases where rigid im- 
position of the constraints yields an unacceptably high x 2 
value, one can experiment with different weights and ex- 
amine the resulting profiles to ascertain, for example, if 
the problem is due to a single outlier, or is more systemic, 
possibly indicative of a breakdown of hydrostatic equilib- 
rium. Thus the flexibility of our method with regard to the 



mass constraints allows us to extract information about 
the dynamical state of the X-ray plasma and the presence 
of statistical anomalies in the data. 

4. CHOOSING A MODEL: THE / TEST AND MARKOV 
CHAIN MONTE CARLO SAMPLING 

In the previous section we assumed that the underly- 
ing model is correct. There is some uncertainty, however, 
about the form the model should take. As mentioned pre- 
viously, the best candidates for this type of analysis are 
relaxed cooling flow clusters. These systems are currently 
not well understood, yet the way we model the cool plasma 
can have a significant effect on the resulting mass profile, 
and conclusions we may draw about the properties of dark 
matter particles or the evolution of large scale structure 
(Arabadjis, Bautz & Garmire 2002). 

A standard procedure for choosing between a simple 
model M s and a complex model M c is to utilize the F 
test (Bcvington 1969). This is done by computing the F 
value for the data set D: 

X 2 (M s |D)-x 2 (M c |D) 

X 2 (M 5 |D)/j/(M 5 ) 1 ' 

and comparing it to the standard F distribution. Here 
X 2 (M S |D) and x 2 (M c |D) are the sum of the squares of the 
error-weighted residuals in the spectroscopic least-squares 
fit to the simple and complex models, respectively, and 
i/(M s ) is the number of degrees of freedom in the simple 
model. 

Unfortunately, the standard F test is not applicable in 
this context. As Protassov et al. (2002) have pointed out, 
the standard F test is valid only in cases where the simple 
model is nested within the complex model. In the present 
case, the simple model lies on a boundary of the complex 
model. That is, the simple model is a special case of the 
complex model, with the normalization of the second emis- 
sion component in the core set to zero. This means that 
its F distribution may deviate significantly from the norm 
(see Figure 3). Instead we must construct an empirical F 
distribution by sampling the probability distribution func- 
tion of the simple model, simulating a data set for each 
sample item, and applying both models to the simulated 
data. Once our F distribution has been constructed, we 
can judge the significance of the extra emission component 
based upon the location within the distribution of the F 
value for the data (Protassov et al. 2002). 

We employ the Markov Chain Monte Carlo (MCMC) 
sampling technique Neal (1993); van Dyk et al. (2001); 
Lewis & Bridle (2002); Hobson & McLachlan (2002) to 
build a large sample of data realizations from which to 
construct an empirical F distribution. Let -P(x) repre- 
sent the posterior probability distribution function of the 
parameters x = x\, x 2 , £jv determined by fitting the 
simple model M s to a real data set D . We can sample 
-P(x) by taking a rejection-based random walk through 
the parameter space. We define a transition probability 
T(x„,x„ + i) as the probability of moving from an initial 
set of parameters x„ to a new set of parameters x„ + i. T 
depends on the value of the posterior distribution at the 
original and new parameter sets. Let g(x„, x n+ i) be an ar- 
bitrary proposal distribution; that is, the probability that 
the new proposed parameter set is x„ + i given that we are 
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presently at x„. If we accept the proposed parameter set 
with probability a, then 



a(x„, 



Vi+i 



T(x. 



(16) 



g(x n ,x„+i) 

which takes into account the odds of actually stepping to 
the new location in parameter space, g(x„, x„ + i), and the 
odds of such a transition between the two locations being 
accepted, T(x„,x„ + i). The acceptance probability a is 
calculated from 

P(x„ + i)q(x n+ i,x„ 



a(x„,x n+ i) = min 



1. 



(17) 



P(x„)q(x„,x„ + i) 

In our case we use a particular form of MCMC sam- 
pling called the Metroplis algorithm (see, e.g., Neal (1993)) 
which uses a symmetric proposal distribution function: 

g(x n ,x„+i) = g(x„+i,x n ) (18) 

This prescription for wandering through parameter 
space constitutes a Markov chain since each new param- 
eter set is chosen according to a probability distribution 
function that depends only upon the previous set of values. 
In the case of the Metropolis algorithm, it is straightfor- 
ward to show that -P(x) is an invariant distribution of the 
Markov chain. Using Equations 16 and 17 we have 

P(x„) T{ 

= g(x n ,x n+ i) min[P(x„),P(x„ + ($p) 

Making use of the symmetry of q in the Metropolis algo- 
rithm, we have 

= q(x n+ i,x„) min[P(x„),P(x„ + i)] 
= P(x„+i) q(x„ + i,x„) a(x 

n+l i x n ) 

resulting in 

P(x„) T(x„,x n+1 ) = P(x n+1 ) T(x ) (20) 

This statement of detailed balance demonstrates that P(x) 
is a stationary distribution of the Markov chain. This 
is necessary, though not sufficient, to ensure that we can 
sample P(x) directly using an appropriately selected chain 
of Monte Carlo simulations. The other necessary condi- 
tion, ergodicity, ensures that any substring of the Markov 
chain will asymptotically approach P(x) regardless of the 
initial conditions, although a derivation of this property is 
beyond the scope of this paper. For a complete discussion 
see Neal (1993). 

In many applications of MCMC sampling one pays spe- 
cial attention to the finite "burn-in" period during which 
the Markov chain equilibrates. The length of the burn-in 
phase depends upon the sensibility of the starting point, 
and the appropriateness of the scale chosen for the pro- 
posal probability distribution step. This is not a consider- 
ation in our case because we start each MCMC sample at 
the (already known) peak of the probability distribution 
function P(x). 

Given P(x) computed by the MCMC process, we can 
construct an empirical F distribution and perform an F 
test on the significance of a second emission component in 
the core. The entire procedure is as follows: 

1. Model the real data set D with M s ; call 
the best-fit parameters Xq. 

2. Use XSPEC to calculate P(D |x) (i.e., the 

likelihood). 



3. Use Bayes' Theorem to calculate P(x|D ) 

(see Equation 7). We discard all unphys- 
ical excursions in parameter space, i.e. 
where T < or p < 0. (In practice, we 
discard at the level of the model normal- 
ization, not p, but p is simply a function 
of the normalization and the binning ge- 
ometry) 

4. Create a large sample of model parame- 
ters x^ using P(x|D ) and the Metropo- 
lis algorithm form of the MCMC tech- 
nique. For each x*, compute a fake data 
set D i} including instrumental effects of 
the Chandra telescope and detectors, as 
well as counting statistics. 

5. Model each D; using both M 5 and M c . 

6. For each pair of models tabulate its F 
value given by Equation 15. 

7. Bin up the set of F values, creating an un- 

normalized histogram , and superimpose 
the F value of the original data. 

In practice this recipe is computationally intensive, not 
because of any features of the MCMC sampling per se, 
but because each of the faked spectra must be modelled 
twice. For a sample size of 1000 simulations, XSPEC must 
simulate 1000 spectra and calculate 2002 sets of best-fit 
values for the model parameters (including the original 
data). This fact leads us to simplify the method. First, 
in order to reduce the modelling time, we have adopted a 
simplified core-halo geometry. In this scheme the "core" is 
represented by a single shell (in this case a sphere), while 
the halo is represented by another shell. Thus M 5 contains 
four parameters, the temperature and density of each of 
the two shells, while M c contains six, the additional two 
parameters representing the temperature and density of 
a second cospatial emission component in the core. This 
simplification also greatly improves the numerical stability 
of the fitting procedure. The algorithm (steps 1-6 above) 
is implemented in a Tel script run within XSPEC. 

Once we have completed step 7 we can distinguish be- 
tween the models. The location of the F value of the data 
within this empirical F distribution contains information 
regarding the relative merit of M 5 and M c . We define the 
significance S of the distribution as 



jF data jv(F) dp 
J °°N(F)dF 



(21) 



The signficance S = 1 — P/, where Pf is the probability 
that the simple model constitutes the better description, 
and that the F value of the data is this large strictly by 
chance. Thus, for a one-parameter model, S = 0.68, 0.90, 
and 0.99 may be interpreted as 1-, 2- and 3-cr detections 
of the additional component. We checked the sensitiv- 
ity of this method by applying it to five simulated data 
sets with known mixtures of hot (T = 5.0 kev) and cold 
(T = 1.0 kev) X-ray plasmas in pressure equilbrium. We 
use mass ratios of M C /M H = 0.000,0.0222,0.0500,0.0857, 
and 0.133, and run 100 MCMC simulations for each. Fig- 
ure 4 shows an empirical F distribution for each of these 
cases. The F value of the data is shown with a dashed 
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vertical line. For a multiphase plasma of which only 2.2% 
is in the cold component, the detection of the multiphase 
plasma is better than 2a. At 5% and greater, the detection 
is statistically highly significant. 

5. APPLICATION TO CHANDRA CLUSTERS 

We illustrate these techniques using a X-ray observa- 
tions of a sample of bright, apparently relaxed galaxy clus- 
ters (see Table 1). Each of these clusters (except A1689) 
contains a significant amount of cooler plasma in its core, 
and is known as a classical "cooling flow cluster" since 
the core radiative cooling time is shorter than the age of 
the cluster. We prepared each archived data set as de- 
scribed in Arabadjis, Bautz & Garmire (2002) and mod- 
elled the emission using the two models described in §2. 
In the first model, each shell contains isothermal plasma 
(N c — 0), while in the other, the central two shells are also 
allowed a second (cooler) emission component (N c = 2), 
and the best-fit parameter values are obtained iterativcly 
using XSPEC. Figures 5-14 show baryon density, baryon 
temperature, enclosed spherical gravitating mass, and en- 
closed cylindrical gravitating mass profiles for each cluster 
in the sample. The left panels show N c = models; the 
right show N c = 2 model. The unconstrained profiles are 
shown as data points, with red [blue] symbols represent- 
ing the hot [cool] plasma components. (Data points which 
lie outside the ordinate range are indicated by arrows.) 
Constrained profiles are drawn as solid curves; the density 
and temperature contributions to the \ 2 value of the con- 
strained solutions are listed in Table 2. The last column 
in that table lists the MCMC significance S of the pres- 
ence of multiphase gas in each cluster core as determined 
from the MCMC-derived empirical F distributions shown 
in Figure 15. For the subset of five clusters which are 
arguably the most relaxed, and are best described by an 
NFW profile (A1689, A1835, A2029, MS1358 and MS2137; 
see AB), we overlay the projected mass profiles with weak 
and/or strong lensing measurements from the literature 
(see Table 3 for references). Weak lensing mass profiles or 
isothermal sphere fits to weak lensing data are shown in vi- 
olet; strong lensing measurements are shown in green. We 
adopt H = 67 km s" 1 Mpc -1 , Q. A = 0.7 and fl m = 0.3. 

A detailed analysis of these profiles and their conse- 
quences for cosmology and dark matter candidates is in 
preparation (AB), so we make only a few brief comments 
here. Of the ten clusters presented here, only HydraA 
did not admit a second emission component in the core 
- attempts to add one resulted in the temperature of the 
second component being set to the temperature of the first 
in the fitting procedure. This is consistent with previous 
Chandra (David et al. 2001) and XMM-Newton (Kaastra 
2004) results. Four of the clusters show evidence for mul- 
tiphase core plasma at the 99% significance level - A2029, 
A2204, MS1358 and ZW3146 (see Table 2). This is con- 
sistent with the study of Kaastra (2004), which finds evi- 
dence for multiphase plasma in many clusters. (Note that, 
with 1000 MCMC simulations per cluster, the precision of 
the significance estimate is ~3%.) For comparison, a clus- 
ter which contained no second plasma component would, 
on the average, show an MCMC significance of order 0.5, 
or 50%. These clusters show a greater difference in their 
core masses between the uni- and multiphase models. This 



phenomenon is illustrated for two clusters, MS1358 and 
MS2137, in Figures 16 and 17. The mass of MS1358 
in the multiphase model is about a factor of two larger 
in than the Uniphase mass, and its MCMC significance 
#=0.987. MS2137, on the other hand, with an MCMC 
significance of 0.271, shows very little difference between 
the two mass models. This is perhaps unsurprising, as one 
would expect that a significant amount of cospatial cool 
plasma in its core would not only display a clear observa- 
tional signature but would also affect the equilibrium con- 
figuration of the plasma. MS2137 and ZW3146 also pro- 
vide an interesting contrast. In the pre- Chandra/XMM- 
Newton era, both clusters were reported to harbor cooling 
flows with mass deposition rates in excess of 1000 M Q y _1 
(Allen 2000), yet they show remarkably disparate evidence 
for multiphase plasma in their cores (S'(MS2137)=0.271; 
S(ZW3146)=0.989). This suggests that there may be more 
than one mechanism at work responsible for the presence 
of 1 keV plasma at the center of galaxy clusters. 

For each constrained reconstruction we use r\ — 2.5 
and physicality weights A\ = A 2 = Ai 2 originally set to 
1 x 10~ 6 . While this often will not rigidly enforce con- 
straint equation 5, it is usually sufficient to enforce equa- 
tion 6. In those cases where not even equation 6 is satis- 
fied, we increased Ai 2 by factors of 10 until the resulting 
profile was nonnegative everywhere if possible. This is the 
origin of the unacceptably high values of the x 2 fidelity 
measures for the N c = 2 model of A2104 and the iV c = 1 
model of HydraA. Obviously, the fidelity of a constrained 
profile tends to be greater when the unconstrained pro- 
file shows only one or two significant outliers. In some 
cases (e.g. A1795) a very small adjustment to the temper- 
ature profile produces a large change in the derived mass. 
This is due to the competition between the derivatives in 
equation 2 - if a statistical fluctuation in the tempera- 
ture measurement is large enough (and positive) it will 
swamp the surface brightness decrement at that radius, 
resulting in a very low or even negative mass. In cases 
where dlogT/dlogr > —dlogp/dlogr, a relatively minor 
adjustment in the temperature can remove an unphysical 
point from the mass profile. 

The five very relaxed clusters generally show better 
agreement with weak lensing mass measurements than 
they do with strong lensing. The rcprojected profile A1689 
shows the least agreement with the weak lensing, differing 
by up to a factor of two at some radii, although the lensing 
profile is a singular isothermal sphere (IS) fit to weak lens- 
ing measurements (King, Clowe & Schneider 2002). The 
strong lensing measurement of Wu (2000), however, ex- 
ceeds our profile again by a factor of 2. The error bars are 
derived by assuming two values of the (unknown) strong 
lensing arc rcdshift (0.8 and 2.0); the discrepancy is thus 
unlikely to be due to incorrect source redshift. Our pro- 
file is also systematically in excess of the IS fit to weak 
lensing observations of A1835 (Clowe & Schneider 2002), 
although it is consistent with the strong lensing point of 
Allen (1998). A2029 and MS1358 show remarkable agree- 
ment with weak lensing data (Menard, Erben & Mellicr 
2003; Hoekstra et al. 1998), although the strong lensing 
measurement in MS1358 (Franx et al. 1997; Allen 1998) is 
moderately discrepant, as is the strong lensing measure- 
ment of MS2137 (Sand, Trcu & Ellis 2002). These issues 
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will be addressed in greater detail in AB. 

6. SUMMARY 

We have presented a technique for calculating the dark 
matter profile of a spherical, relaxed galaxy cluster. We 
have formulated a technique for coping with statistical un- 
certainty in the measurement of the cluster plasma tem- 
perature. We have also described a method for determin- 
ing whether there is a statistically significant presence of 
multiphase plasma in the galaxy cluster core. We have 
applied these tools to a sample of relaxed galaxy clusters 
observed with Chandra, and find that 4/10 require a mul- 
tiphase treatment of their core plasma. Our masses are in 
broad agreement with weak lensing studies, though are of- 
ten exceeded by those derived from strong lensing models. 

JSA would like to thank Steve Allen, Aaron Lewis, 
Jimmy Irwin and Renato Dupke for many enlightening 
discussions. The authors would like to thank the anony- 
mous referee for comments which have improved this pa- 
per. This work was supported by SAO Grant AR3-4016X. 
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Fig. 1. — Schematic representations of the unconstrained probability distribution function (Pjj, top), and the constrained distribution (Pc, 
bottom) for a two-dimensional parameter space. The estimated peak of the constrained distribution is shown in blue; the true peak is shown 
in violet. (For clarity we have not renormalized Pc after removing the physically disallowed region of parameter space.) 




Fig. 2. — The penalty function g(x) and the associated (approximate) probability distribution function h(x) = e 9 ^ for r/ = 7/2. Here 
h(x) nearly, but not precisely, normalized; we have set C = 1 (see Equation 13). 
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Fig. 3. — The non-nested relationship between the simple and complex models. The simple model corresponds to the complex model with 
one of the normalizations set to 0. 
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Fig. 4. — Empirical F distributions for models M s and M c of five simulated data sets. The cold to hot plasma mass ratio, assuming that 
the two are in pressure equilibrium, is shown for each case, along with the significance of the presence of the cool component as determined 
through 100 MCMC simulations (see Equation 21). The F value of the original data set is indicated by a vertical dashed line. 
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Fig. 5. — Baryon density, baryon temperature, spherically enclosed mass, and cylindrically enclosed (projected) mass profiles of A1689, for 
N c = (left panels) and N c = 2 (right panels). The hot plasma is shown in red, the cool component in blue. Arrows represent points which 
lie outside the ordinate range. Reconstructions adhering to constraint equation 6 are shown as solid black curves. Weak (violet) and strong 
(green) gravitational lensing measurements are shown for comparison in the bottom panels. (A solid violet line represents an isothermal 
sphere fit to the weak lensing data set.) See Table 3 for lensing references. 
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Fig. 6.— 



Same as Figure 5, for A1795. 
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Fig. 7.— 



Same as Figure 5, for A1835. 



17 



T 




J d 



10 50 100 10 50 100 

r (kpc) 



Fig. 8.— Same as Figure 5, for A2029. 
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Fig. 9. — Same as Figure 5, for A2104. Note that there is no reconstruction solution in the N c = 2 case which is consistent with constraint 
equations 5 and 6. 
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Fig. 10.— Same as Figure 5, for A2204. 
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Fig. 11. — Same as Figure 5, for HydraA. Note that there is no reconstruction solution in the N c = 1 case which is consistent with constraint 
equations 5 and 6, and that the Hydra A data set did not admit N c = 2 emission models. 
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Fig. 12.— Same as Figure 5, for MS1358. 
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Fig. 13.— Same as Figure 5, for MS2137. 
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Fig. 14.— Same as Figure 5, for ZW3146. 
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Fig. 15. — Empirical F distributions and the MCMC significance S of a second cospatial core plasma component for each cluster in the 
sample. The F value of the original Chandra data set is denoted with a vertical dashed line. (Note that Hydra A data did not admit an 
N c = 2 emission model.) 
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Fig. 16. — Mass profiles for models N c = (red) and N c = 2 (blue) of MS1358, overlaid for comparison. Weak (violet) and strong (green) 
gravitational lensing measurements are shown for comparison. 
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Fig. 17.— Same as Figure 16, for MS2137. 
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Table 1 



Galaxy Cluster 
Sample 



cluster 


z 


A1689 


0.181 


A1795 


0.0631 


A1835 


0.2523 


A2029 


0.0765 


A2104 


0.1554 


A2204 


0.1523 


HydraA 


0.0522 


MS1358 


0.328 


MS2137 


0.313 


ZW3146 


0.2906 
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Table 2 

Fidelity measures of baryon density (xp) and temperature (xl*) of each constrained 

MASS PROFILE (SEE EQUATION 12), AND MULTIPHASE CORE PLASMA SIGNIFICANCE S. (NOTE 

that Hydra A data did not admit an N c = 2 emission model.) 



cluster 


Xp (Nc = 0) 


Xt (N c = 0) 


X 2 P (N c = 2) 


Xt (Nc = 2) 


S 


A1689 


0.00150 


0.467 


0.00825 


1.14 


0.342 


A1795 


0.0777 


7.45 


0.0156 


1.29 


0.823 


A1835 


0.00229 


1.13 


1.188 


1.41 


0.483 


A2029 


0.00774 


1.17 


0.00773 


1.20 


0.998 


A2104 


0.00251 


0.624 


0.190 


4.51 


0.661 


A2204 


0.000994 


0.759 


0.00947 


0.284 


0.999 


HydraA 


0.197 


2.43 








MS1358 


0.00663 


0.381 


0.00454 


0.356 


0.987 


MS2137 


0.0101 


1.08 


0.0110 


1.38 


0.271 


ZW3146 


0.0293 


1.25 


0.0150 


0.913 


0.989 



Table 3 



Lensing comparisons for the very relaxed cluster subset. 



cluster 


weak lensing 


strong lensing 


A1689 


King, Clowe & Schneider (2002) 


Wu (2000) 


A1835 


Clowe & Schneider (2002) 


Allen (1998) 


A2029 


Menard, Erben & Mellier (2003) 




MS1358 


Hoekstra et al. (1998) 


Franx et al. (1997); Allen (1998) 


MS2137 




Sand, Treu & Ellis (2002) 



